XUXEN: A Spelling Checker/Corrector for Basque Based on Two-Level Morphology

Authors

  • Eneko Agirre
  • Iñaki Alegria
  • Xabier Arregi
  • Xabier Artola
  • Arantza Díaz de Ilarraza
  • Montse Maritxalar
  • Kepa Sarasola
  • Miriam Urkia
Abstract

The application of the formalism of two-level morphology to Basque and its use in the elaboration of the XUXEN spelling checker/corrector are described. This application is intended to cover a large part of the language. Because Basque is a highly inflected language, spelling checking and correction have been conceived as a by-product of a general-purpose morphological analyzer/generator. This analyzer is taken as a basic tool for current and future work on automatic processing of Basque. An extension for continuation class specifications in order to deal with long-distance dependencies is proposed. This extension consists basically of two features added to the standard formalism which allow the lexicon builder to make explicit the interdependencies of morphemes. User lexicons can be interactively enriched with new entries, enabling the checker from then on to recognize all the possible inflected forms derived from them. Because the language was standardized late, writers do not always know the standard form to be used and make errors. These "typical errors" are treated in a specific way by describing them within the two-level lexicon system. In this sense, XUXEN is intended as a useful tool for the standardization of present-day written Basque.
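To make the idea of checking spelling through morphological analysis more concrete, the following is a minimal Python sketch of a continuation-class lexicon. It is an illustration under stated assumptions, not XUXEN's implementation: the names `LEXICON`, `analyses`, `is_correct` and `add_user_entry` are hypothetical, the handful of Basque entries is a toy sample, and neither the two-level phonological rules nor the proposed extension for long-distance dependencies is modeled.

```python
# Toy continuation-class lexicon (illustrative entries only, not XUXEN's data).
# Each sublexicon maps a morpheme to the sublexicon that may follow it;
# "#" marks the end of the word.
LEXICON = {
    "Root": {"etxe": "NounSuffix", "mendi": "NounSuffix"},
    "NounSuffix": {"": "#", "a": "#", "ak": "#", "an": "#"},
}


def analyses(word, lexicon, state="Root"):
    """Return every morpheme segmentation of `word` licensed by the lexicon."""
    results = []
    for morph, nxt in lexicon[state].items():
        if not word.startswith(morph):
            continue
        rest = word[len(morph):]
        if nxt == "#":
            # The word must be fully consumed when the end marker is reached.
            if rest == "":
                results.append([morph] if morph else [])
        else:
            for tail in analyses(rest, lexicon, nxt):
                results.append([morph] + tail)
    return results


def is_correct(word, lexicon=LEXICON):
    """A word form is accepted if at least one morphological breakdown exists."""
    return bool(analyses(word, lexicon))


def add_user_entry(lemma, continuation, lexicon=LEXICON):
    """Add a user lemma; its continuation class licenses all inflected forms."""
    lexicon["Root"][lemma] = continuation


if __name__ == "__main__":
    print(is_correct("etxean"))       # accepted: etxe + an
    print(is_correct("liburuan"))     # rejected: lemma not yet in the lexicon
    add_user_entry("liburu", "NounSuffix")
    print(is_correct("liburuan"))     # accepted after the user entry is added
```

In this sketch a single user entry is enough for every suffix in its continuation class to be recognized, which mirrors the behaviour the abstract describes for user lexicons.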

Similar resources

A Morphological Analysis Based Method for Spelling Correction

Xuxen is a spelling checker/corrector for Basque which is going to be commercialized next year. The checker recognizes a word form if a correct morphological breakdown is allowed. The morphological analysis is based on two-level morphology. The correction method distinguishes between orthographic errors and typographical errors. • Typographical errors (or mistypings) are non-cognitive errors whic...

Designing spelling correctors for inflected languages using lexical transducers

This paper describes the components used in the design of the commercial Xuxen II spelling checker/corrector for Basque. It is a new version of the Xuxen spelling corrector (Aduriz et al., 97) which uses lexical transducers to improve the process. A very important new feature is the use of user dictionaries whose entries can recognise both the original and inflected forms. In languages wit...

Spelling Correction: from Two-Level Morphology to Open Source

Basque is a highly inflected and agglutinative language (Alegria et al., 1996). Two-level morphology has been applied successfully to this kind of language, and there are two-level based descriptions for very different languages. Once the morphological description of a language is done, it is easy to develop a spelling checker/corrector for that language. However, what happens if we want to use...

Using Finite State Technology in Natural Language Processing of Basque

This paper describes the components used in the design and implementation of NLP tools for Basque. These components are based on finite state technology and are devoted to the morphological analysis of Basque, an agglutinative pre-Indo-European language. We think that our design can be interesting for the treatment of other languages. The main components developed are a general and robust morph...

Publication date: 1992